Airbnb Project

Exploratory Analysis

First, we will load in the necessary libraries.

library(ggplot2)
library(dplyr)
library(tidyverse)
library(ggmap)
library(knitr)
library(kableExtra)
library(corrplot)
library(scales)
library(RColorBrewer)
library(plotly)

Then, we will read in the csv file containing the data.

airbnbdf <- read.csv(file = 'data/ab-nyc-2019.csv')

Now that we have the data loaded in, we will remove the columns that are unnecessary.

airbnbdf <- subset(airbnbdf, select = -c(name, id, last_review, host_name))

The data contains some NA values that have to be dealt with. The only column containing NA values is “reviews_per_month”, and the NA values occur in rows where “number_of_reviews” is 0: a listing with no reviews has no monthly review rate. For that reason, the NA values will be replaced with 0.
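The claim that the NA values only occur where there are no reviews can be checked before anything is overwritten. The function below is a minimal sketch of such a check; the name na_only_where_no_reviews is hypothetical and is not part of the original analysis.

```r
# Returns TRUE when every row with a missing reviews_per_month
# also has number_of_reviews equal to 0.
# (Hypothetical helper; not part of the original analysis.)
na_only_where_no_reviews <- function(df) {
  missing_rate <- is.na(df$reviews_per_month)
  all(df$number_of_reviews[missing_rate] == 0)
}

# Usage (run before the NA values are replaced):
# na_only_where_no_reviews(airbnbdf)
```

If the check returns FALSE, replacing the NA values with 0 would misstate the review rate for some listings.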

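# Identify the columns that contain at least one NA value
naCols <- airbnbdf[colSums(is.na(airbnbdf)) > 0]
colnames(naCols)
## [1] "reviews_per_month"
# Keep the rows with NA values for inspection before overwriting them
na_values <- airbnbdf[rowSums(is.na(airbnbdf)) > 0, ]
# Replace the NA values with 0 (only reviews_per_month is affected)
airbnbdf[is.na(airbnbdf)] <- 0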

Check to see what the data looks like.

head(airbnbdf, 10)

Summary Stats

Summary stats of the dataset:

Summary of Stats
Average Price  Median Price  Minimum Price  Maximum Price  St Dev of Price  Number of Reviews  Total Number of Airbnbs
       152.72           106              0          10000           240.15            1138005                    48895

Summary stats of the dataset by borough:

Summary of Stats by Borough
Borough        Average Price  Median Price  Minimum Price  Maximum Price  St Dev of Price  Number of Reviews  Total Number of Airbnbs
Bronx                  87.50            65              0           2500           106.71              28371                     1091
Brooklyn              124.38            90              0          10000           186.87             486574                    20104
Manhattan             196.88           150              0          10000           291.38             454569                    21661
Queens                 99.52            75             10          10000           167.10             156950                     5666
Staten Island         114.81            75             13           5000           277.62              11541                      373
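The code that produced the two summary tables is not shown, so the following is only a sketch of how they could be computed with dplyr. It assumes the price column is named price and the borough column is named neighbourhood_group; the function name price_summary is hypothetical.

```r
library(dplyr)

# Summary statistics for price and reviews, optionally per group.
# Assumed column names: price, number_of_reviews.
price_summary <- function(df, ...) {
  df %>%
    group_by(...) %>%   # no grouping columns given: one overall row
    summarise(
      avg_price     = round(mean(price), 2),
      median_price  = median(price),
      min_price     = min(price),
      max_price     = max(price),
      sd_price      = round(sd(price), 2),
      total_reviews = sum(number_of_reviews),
      n_listings    = n()
    )
}

# Usage with the data frame loaded above:
# price_summary(airbnbdf)                        # whole dataset
# price_summary(airbnbdf, neighbourhood_group)   # one row per borough
```

The result can then be passed to kable() for display.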

Scatterplot of the Airbnbs in New York City using the longitude and latitude provided for each location:
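The plotting code is not shown here. A scatterplot like this one could be sketched with ggplot2 alone, without a base map, assuming the columns are named longitude, latitude, and neighbourhood_group; plot_listing_locations is a hypothetical name.

```r
library(ggplot2)

# One point per listing, coloured by borough.
# Assumed column names: longitude, latitude, neighbourhood_group.
plot_listing_locations <- function(df) {
  ggplot(df, aes(x = longitude, y = latitude, colour = neighbourhood_group)) +
    geom_point(size = 0.5, alpha = 0.4) +  # small, translucent points reduce overplotting
    coord_quickmap() +                     # approximate map aspect ratio for lon/lat data
    labs(x = "Longitude", y = "Latitude", colour = "Borough")
}

# Usage with the data frame loaded above:
# plot_listing_locations(airbnbdf)
```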

Since the scatterplot can be a bit crowded, a density map will be better:

Density map for each borough:
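Both density maps could be drawn with a two-dimensional kernel density estimate. The sketch below uses ggplot2's stat_density_2d() and is not necessarily how the original figures were made. The column names longitude, latitude, and neighbourhood_group are assumptions, and plot_listing_density is a hypothetical name.

```r
library(ggplot2)

# Filled density contours of listing locations.
# Set by_borough = TRUE to draw one panel per borough.
plot_listing_density <- function(df, by_borough = FALSE) {
  p <- ggplot(df, aes(x = longitude, y = latitude)) +
    stat_density_2d(aes(fill = after_stat(level)),
                    geom = "polygon", alpha = 0.5) +
    scale_fill_viridis_c(name = "Density level") +
    labs(x = "Longitude", y = "Latitude")
  if (by_borough) {
    # Free scales let each panel zoom in on its own borough
    p <- p + facet_wrap(~ neighbourhood_group, scales = "free")
  }
  p
}

# Usage with the data frame loaded above:
# plot_listing_density(airbnbdf)
# plot_listing_density(airbnbdf, by_borough = TRUE)
```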

Insights

Bar graph of the number of Airbnbs per borough:
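A count-per-borough bar graph needs no prior aggregation, because geom_bar() counts the rows in each category. The following is a sketch that assumes the borough column is named neighbourhood_group; plot_listings_per_borough is a hypothetical name.

```r
library(ggplot2)

# Bar height = number of rows (listings) in each borough.
plot_listings_per_borough <- function(df) {
  ggplot(df, aes(x = neighbourhood_group)) +
    geom_bar(fill = "steelblue") +
    labs(x = "Borough", y = "Number of Airbnbs")
}

# Usage with the data frame loaded above:
# plot_listings_per_borough(airbnbdf)
```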

Bar graph of the average price of an Airbnb per borough:
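For a mean-price bar graph, the means have to be computed first and then drawn with geom_col(). The sketch below takes the grouping column as an argument, so the same helper could also serve for the room-type graph further down. The column names price, neighbourhood_group, and room_type are assumptions, and plot_mean_price is a hypothetical name.

```r
library(dplyr)
library(ggplot2)

# Mean price per level of group_col, drawn as bars.
plot_mean_price <- function(df, group_col) {
  means <- df %>%
    group_by({{ group_col }}) %>%
    summarise(mean_price = mean(price))
  ggplot(means, aes(x = {{ group_col }}, y = mean_price)) +
    geom_col(fill = "steelblue") +
    labs(y = "Mean price")
}

# Usage with the data frame loaded above:
# plot_mean_price(airbnbdf, neighbourhood_group)
# plot_mean_price(airbnbdf, room_type)
```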

Bar graph of the count of each kind of Airbnb per borough:
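Counts broken down by two categories can be shown as grouped bars by mapping the second category to the fill colour. This is a sketch that assumes the kind of Airbnb is stored in a column named room_type; plot_room_types_per_borough is a hypothetical name.

```r
library(ggplot2)

# One bar per (borough, room type) pair, placed side by side.
# Assumed column names: neighbourhood_group, room_type.
plot_room_types_per_borough <- function(df) {
  ggplot(df, aes(x = neighbourhood_group, fill = room_type)) +
    geom_bar(position = "dodge") +
    labs(x = "Borough", y = "Number of Airbnbs", fill = "Room type")
}

# Usage with the data frame loaded above:
# plot_room_types_per_borough(airbnbdf)
```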

Bar graph of the mean price by type of room:

Bar graph of the number of reviews per borough:
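Review totals per borough can be obtained with dplyr's count() by using the review count as a weight, which sums it within each group. The sketch below assumes the borough column is named neighbourhood_group; plot_reviews_per_borough is a hypothetical name.

```r
library(dplyr)
library(ggplot2)

# Total number of reviews per borough, drawn as bars.
plot_reviews_per_borough <- function(df) {
  totals <- df %>%
    count(neighbourhood_group, wt = number_of_reviews, name = "total_reviews")
  ggplot(totals, aes(x = neighbourhood_group, y = total_reviews)) +
    geom_col(fill = "steelblue") +
    scale_y_continuous(labels = scales::comma) +  # 486,574 instead of 486574
    labs(x = "Borough", y = "Number of reviews")
}

# Usage with the data frame loaded above:
# plot_reviews_per_borough(airbnbdf)
```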

Scatterplot of the number of reviews vs. the price of an Airbnb. The scatterplot shows no apparent correlation between the number of reviews an Airbnb has and its price.
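A visual impression like this can be backed by a correlation coefficient. The helper below is a minimal sketch that assumes the price column is named price; reviews_price_cor is a hypothetical name. Because prices are heavily skewed (the maximum is 10000 against a median of 106), the rank-based Spearman coefficient may be a useful second check.

```r
# Correlation between number of reviews and price.
# method can be "pearson" (linear) or "spearman" (rank-based).
reviews_price_cor <- function(df, method = "pearson") {
  cor(df$number_of_reviews, df$price, method = method)
}

# Usage with the data frame loaded above:
# reviews_price_cor(airbnbdf)
# reviews_price_cor(airbnbdf, method = "spearman")
```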

Correlation matrix. According to the correlation matrix, no pair of variables appears to be strongly correlated.
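The code for the correlation matrix is not shown. One way to build it, sketched below, is to keep only the numeric columns and pass them to cor(); the result can then be drawn with corrplot, which is loaded above. The function name numeric_cor_matrix is hypothetical.

```r
# Pairwise Pearson correlations between all numeric columns.
numeric_cor_matrix <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  round(cor(df[is_num]), 2)
}

# Usage with the data frame loaded above:
# corr_matrix <- numeric_cor_matrix(airbnbdf)
# corrplot::corrplot(corr_matrix, method = "color", type = "upper")
```

Printing the matrix itself makes it easy to read off the exact coefficient for any pair of variables.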